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Final  Report  for  Contract  F49620-77-C-0037 

1*  Direct  Methods  ^ 

Work  on  direct  sparse  matrix  methods  continued  under  the  grant.  One  of  the 
chief  thrusts  of  our  research  has  been  how  to  use  these  sparse  matrix  techniques 
in  situations  where  primary  memory  is  smaller  than  problem  size. 

Along  with  Andy  Sherman  of  the  Department  of  Computer  Science  at  the 
University  of  Texas,  we  investigated  what  are  called  Minimal  Storage  Methods. 
Rather  than  save  the  factorization  in  auxiliary  storage,  we  throw  away  most 
nonzero  entries  and  recompute  them  as  necessary  during  back-solution  [7,  8]. 
Surprisingly,  for  model  problems,  the  work  required  is  less  than  twice  that  for 
conventional  sparse  elimination,  although  the  bookkeeping  overhead  does  increase 
somewhat . 

We  investigated  the  use  of  secondary  storage  in  conjunction  with  band 
elimination  [16,  11].  This  work  focused  on  trying  to  understand  and 
parameterize  the  general  issues  involved,  designing  and  analyzing  classes  of 
algorithms  that  use  secondary  storage,  implementing  and  benchmarking  these 
algorithms,  and  studying  new  computer  architectures  and  software  systems  that 
would  allow  us  to  use  secondary  storage  more  effectively  to  solve  banded  linear 
systems • 

For  sparse  elimination,  the  straightforward  approach  to  auxiliary  storage 
(forming  the  rows  of  the  factorization  one  at  a  time  while  keeping  the 
previously  computed  rows  in  auxiliary  storage  and  fetching  them  as  needed)  is 
grossly  inefficient:  the  I/O  overwhelms  the  computation.  It  appears,  however, 
that  the  minimal-storage  approach  to  sparse  elimination  [8]  can  be  adapted  to 
auxiliary  storage  and  will  result  in  an  efficient  algorithm  for  solving  very 
large  sparse  systems  of  linear  equations. 
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Our  research  on  iterative  methods  centered  on  multi-grid  iterative  methods 
and  on  preconditioned  conjugate  gradient  and  conjugate  residual  methods  [4,  3]. 
While  the  multi-grid  algorithm  has  been  shown  mathematically  to  be 
asymptotically  optimal  [2]  and  it  well  known  that  preconditioned  conjugate 
gradient  is  not  [4],  recent  empirical  computer  studies  1 5 ]  indicate  that,  for 
problems  of  practical  size,  the  preconditioned  conjugate  gradient  method  is 
surprisingly  competitive*  Moreover,  very  recently  Eisenstat  [12]  showed  how  to 
significantly  speed  up  preconditioned  conjugate  gradient  codes  based  on 
approximate  factorizations,  making  preconditioned  conjugate  gradient  methods 
even  more  competitive. 

We  investigated  extensions  of  many  of  the  ideas  of  preconditioned 
conjugate  gradient  methods  to  the  class  of  nonsymmetric  matrices  with  positive 
definite  symmetric  parts.  Such  matrices  arise,  for  example,  in 
finite-difference  approximations  to  the  convection-conduction  equation  [1].  We 
obtained  a  number  of  startling  empirical  results  [13,  9],  but  while  we  have  some 
new  theory,  we  still  cannot  explain  all  of  the  experiments.  We  obtained  the 
first  convergence  proof  [9]  of  Orthomin  [17],  one  of  the  algorithms  that  appear 
to  be  most  promising  in  practice.  Much  theoretical  and  experimental  work 
remains  to  be  done  in  this  area.  The  surface  has  barely  been  scratched. 

2m  Mathematical  Software 

In  order  to  disseminate  numerical  algorithms  to  the  scientific  community, 
numerical  analysts  must  prepare  veil-documented,  modular,  portable  mathematical 
software  that  implements  these  algorithms.  Otherwise  algorithms  are  either 
ignored  because  they  seem  too  complicated  to  program  or  mis- implemented , 
sometimes  in  grossly  inefficient  ways.  One  of  the  prime  objectives  in  our 
research  has  been  to  implement  the  ideas  we  develop. 
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Our  vork  on  mathematical  software  for  solving  very  large  sparse  systems  of 
linear  equations  focused  both  on  direct  methods,  where  the  major  emphasis  was  on 
adapting  in-memory  techniques  to  situations  with  limited  memory,  and  on 
iterative  methods,  where  the  major  emphasis  was  on  extending  preconditioned 
conjugate  gradient  methods  to  nonsymmetric  systems • 

Along  with  Andy  Sherman  of  the  Department  of  Computer  Science  at  the 
University  of  Texas,  we  developed  a  prototype  code  for  Minimal  Storage  Sparse 
Elimination  [8].  In  tests  against  our  own  classic  Yale  Sparse  Matrix  Package, 
it  proved  to  be  surprisingly  competitive  for  a  simple  model  problem.  The  same 
ideas  used  to  implement  minimal  storage  sparse  elimination  seem  to  apply  to 
adapting  general  sparse  elimination  to  auxiliary  storage  (like  disks)  in  such  a 
way  as  both  to  minimize  I/O  and  to  maximize  the  overlap  of  I/O  and  computation. 

The  straightforward  implementation  of  sparse  elimination  [10,  14]  does  not 
mesh  well  with  the  latest  class  of  super-computer,  the  vector  processor.  Vector 
processors  differ  from  the  more  conventional  scalar  processors  in  their  ability 
to  operate  on  vectors,  sequences  of  contiguous  or  regularly  spaced  memory 
locations,  far  more  efficiently  than  on  the  components  individually.  (Thus  the 
time  to  add  together  two  vectors  of  length  n  would  be  s+tn,  where  s  denotes  the 
startup  time  and  t  («  s)  the  time  per  addition,  whereas  the  time  to  add 
together  two  scalars  would  be  s+t.)  To  take  advantage  of  this  vector  hardware, 
however,  it  is  necessary  to  "vectorize"  the  algorithms  used,  sometimes  replacing 
a  nonvectorizable  one  that  would  run  faster  on  a  scalar  machine  with  a  slower 
but  vectorizable  one.  Unfortunately,  the  innermost  loop  in  sparse  elimination, 

where  the  bulk  of  the  computation  is  done,  is  of  the  form 
DO  1  J-JMIH,JMAX 

1  R0W(JU(J))  -  RW(JU(J))  ♦  UKI*U( J) 
which  involves  a  scatter-fetch  (creating  a  contiguous  vector  from  randomly 
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scattered  memory  locations),  adding  one  multiple  of  a  vector  to  another,  and 
then  a  scatter-store.  Only  the  second  phase  is  vectorizable .  On  the  other 
hand,  the  MSSE  approach  to  sparse  elimination  does  appear  to  vectorize  well  and 
could  run  reasonably  fast. 

We  have  investigated  a  number  of  variants  of  the  multi-grid  approach  for 
solving  finite-difference  approximations  to  linear  boundary-value  problems  for 
elliptic  partial  differential  equations.  To  do  uniform  comparisons,  we  have 
developed  a  package  implementing  multi-grid  in  a  fairly  general  manner  [6]. 

We  investigated  extensions  of  many  of  the  ideas  underlying  preconditioned 
conjugate  gradient  methods  to  the  class  of  nonsymmetric  matrices  with  positive 
definite  symmetric  part*  In  order  to  compare  the  different  iterative  methods 
and  preconditionings  in  a  common  environment,  we  created  a  prototype  package 
that  implements  these  methods  [13],  the  user  interface  being  similar  to  that 
used  in  ITPACK  [15].  As  we  gain  more  experience  about  which  methods  are  most 
effective,  we  hope  to  refine  this  prototype  into  mathematical  software. 
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